Techniques for Shared Resource Management in Systems with Throughput Processors
Abstract
The continued growth in the computational capability of throughput processors has made them the platform of choice for a wide variety of high-performance computing applications. Graphics Processing Units (GPUs) are a prime example of throughput processors that can deliver high performance for applications ranging from typical graphics applications to general-purpose data-parallel (GPGPU) applications. However, this success has been accompanied by new performance bottlenecks throughout the memory hierarchy of GPU-based systems. This dissertation identifies and eliminates performance bottlenecks caused by major sources of interference throughout the memory hierarchy. Specifically, we provide an in-depth analysis of inter- and intra-application interference, as well as inter-address-space interference, all of which significantly degrade the performance and efficiency of GPU-based systems. To minimize such interference, we introduce changes to the memory hierarchy of systems with GPUs that make the memory hierarchy aware of both CPU and GPU applications' characteristics. We introduce mechanisms to dynamically analyze different applications' characteristics and propose four major changes throughout the memory hierarchy.

First, we introduce Memory Divergence Correction (MeDiC), a cache management mechanism that mitigates intra-application interference in GPGPU applications by allowing the shared L2 cache and the memory controller to be aware of the GPU's warp-level memory divergence characteristics, that is, the fact that some of a warp's memory requests may hit in the cache while others miss, so the whole warp stalls until its slowest request returns. MeDiC uses this warp-level memory divergence information to give more cache space and more memory bandwidth to the warps that benefit most from these resources. Our evaluations show that MeDiC significantly outperforms multiple state-of-the-art caching policies proposed for GPUs. Second, we introduce the Staged Memory Scheduler (SMS), an application-aware CPU-GPU memory request scheduler that mitigates inter-application interference in heterogeneous CPU-GPU systems.
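Returning briefly to MeDiC: its central idea, classifying each warp by how often its requests hit in the shared L2 cache and then using that class to decide cache bypassing and memory-request priority, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the five warp types, the hit-ratio thresholds `MOSTLY_MISS_MAX` and `MOSTLY_HIT_MIN`, and the names `WarpStats`, `classify_warp`, `should_bypass_l2`, and `is_high_priority` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class WarpType(Enum):
    ALL_MISS = 0
    MOSTLY_MISS = 1
    BALANCED = 2
    MOSTLY_HIT = 3
    ALL_HIT = 4


# Hypothetical thresholds on a warp's L2 hit ratio (assumed, not from the source).
MOSTLY_MISS_MAX = 0.2
MOSTLY_HIT_MIN = 0.7


@dataclass
class WarpStats:
    """Per-warp L2 hit/access counters (hypothetical bookkeeping)."""
    hits: int = 0
    accesses: int = 0

    def record(self, hit: bool) -> None:
        self.accesses += 1
        self.hits += int(hit)

    @property
    def hit_ratio(self) -> float:
        return self.hits / self.accesses if self.accesses else 0.0


def classify_warp(stats: WarpStats) -> WarpType:
    """Map a warp's observed hit ratio to a warp type."""
    if stats.accesses == 0:
        return WarpType.BALANCED  # no history yet: treat the warp as neutral
    r = stats.hit_ratio
    if r == 0.0:
        return WarpType.ALL_MISS
    if r == 1.0:
        return WarpType.ALL_HIT
    if r <= MOSTLY_MISS_MAX:
        return WarpType.MOSTLY_MISS
    if r >= MOSTLY_HIT_MIN:
        return WarpType.MOSTLY_HIT
    return WarpType.BALANCED


def should_bypass_l2(t: WarpType) -> bool:
    # Warps that rarely hit gain little from cache space; letting them bypass
    # the L2 leaves more capacity for warps that mostly hit.
    return t in (WarpType.ALL_MISS, WarpType.MOSTLY_MISS)


def is_high_priority(t: WarpType) -> bool:
    # A mostly-hit warp is held up by only a few misses, so serving those
    # misses first at the memory controller lets the whole warp resume sooner.
    return t in (WarpType.MOSTLY_HIT, WarpType.ALL_HIT)
```

For example, a warp that records three hits and one miss has a hit ratio of 0.75, so `classify_warp` returns `WarpType.MOSTLY_HIT` and its remaining miss would be treated as high priority. A real mechanism would sample these counters periodically in hardware rather than per request in software.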
SMS takes a fundamentally new approach to memory controller design: it decouples the memory controller into three significantly simpler structures, each with a separate task, which operate together to greatly improve both system performance and fairness. The first stage groups requests based on row-buffer locality. This grouping allows the second stage to focus solely on inter-application scheduling decisions. Because these two stages enforce the high-level performance and fairness policies, the last stage is simple logic that deals only with low-level DRAM commands and timing. SMS is also configurable: it allows the system software to trade off the quality of service provided to CPU applications against that provided to GPU applications. Our evaluations show that SMS not only reduces inter-application interference caused by the GPU, thereby improving heterogeneous system performance, but also provides better scalability and power efficiency than multiple state-of-the-art memory schedulers.
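The three-stage split and the configurable CPU/GPU trade-off described above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the dissertation's design: the class and method names, the batch-size cap `max_batch`, the use of queued-request counts as the "shortest job" metric, and the `sjf_prob` parameter (the probability that stage 2 picks shortest-job-first rather than round-robin) are hypothetical stand-ins for whatever knob the system software actually exposes. DRAM timing is ignored entirely.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    source: str  # issuing application, e.g. "cpu0" or "gpu" (hypothetical labels)
    bank: int
    row: int


class StagedMemoryScheduler:
    """Toy model of a three-stage memory scheduler; all names are illustrative."""

    def __init__(self, sources, num_banks, sjf_prob=0.9, max_batch=4, seed=0):
        self.source_fifos = {s: deque() for s in sources}     # stage 1 input
        self.ready_batches = {s: deque() for s in sources}    # stage 1 output
        self.bank_fifos = [deque() for _ in range(num_banks)]  # stage 3 queues
        self.sjf_prob = sjf_prob    # hypothetical knob: CPU vs GPU QoS trade-off
        self.max_batch = max_batch  # hypothetical cap on batch length
        self.rng = random.Random(seed)
        self.rr_order = list(sources)
        self.rr_next = 0

    def enqueue(self, req):
        self.source_fifos[req.source].append(req)

    def form_batches(self):
        """Stage 1: group consecutive same-row requests from each source.

        Simplification: batches are formed eagerly from whatever is queued,
        with no timeout or partial-batch handling.
        """
        for src, fifo in self.source_fifos.items():
            while fifo:
                first = fifo.popleft()
                batch = [first]
                while (fifo and len(batch) < self.max_batch
                       and (fifo[0].bank, fifo[0].row) == (first.bank, first.row)):
                    batch.append(fifo.popleft())
                self.ready_batches[src].append(batch)

    def _round_robin(self):
        n = len(self.rr_order)
        for i in range(n):
            src = self.rr_order[(self.rr_next + i) % n]
            if self.ready_batches[src]:
                self.rr_next = (self.rr_next + i + 1) % n
                return src
        return None

    def pick_batch(self):
        """Stage 2: inter-application decision about whose batch goes next."""
        ready = [s for s in self.rr_order if self.ready_batches[s]]
        if not ready:
            return None
        if self.rng.random() < self.sjf_prob:
            # Shortest job first: favor the source with the fewest queued
            # requests (typically a latency-sensitive CPU application).
            src = min(ready, key=lambda s: sum(len(b) for b in self.ready_batches[s]))
        else:
            # Round-robin keeps bandwidth-heavy sources (e.g. the GPU) progressing.
            src = self._round_robin()
        return self.ready_batches[src].popleft()

    def step(self):
        """Stage 3: move the chosen batch into per-bank FIFOs, in order."""
        batch = self.pick_batch()
        if batch is None:
            return None
        for req in batch:
            # A real DRAM command scheduler would issue these subject to timing.
            self.bank_fifos[req.bank].append(req)
        return batch
```

In this model, setting `sjf_prob` close to 1 favors sources with short request streams (usually the CPU), while lowering it shifts stage 2 toward round-robin and gives the GPU a larger share of bandwidth. The design choice the sketch is meant to show is that, once stages 1 and 2 have made the policy decisions, stage 3 reduces to in-order per-bank queues.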